Abstract

A summary is given of the dynamic- 
optimization approach to speedup learning 
for logic programs. The problem is to re- 
structure a recursive program into an equiv- 
alent program whose expected performance 
is optimal for an unknown but fixed popu- 
lation of problem instances. We define the 
term “optimal” relative to the source of in- 
put instances and sketch an algorithm that 
can come within a logarithmic factor of op- 
timal with high probability. Finally we show 
that finding high-utility unfolding operations 
(such as EBG) can be reduced to clause re- 
ordering. 

Purpose 

This paper presents an outline of the motivation, prob- 
lem, methods, and results of some recent work on 
dynamic optimization of programs. An earlier paper 
(Laird, 1992b) contains details, experimental results, 
and more complete references to related work. In ad- 
dition, this paper discusses some new results, partic- 
ularly the efficient handling of unfolding transforma- 
tions. 

Dynamic Optimization 

“Speedup learning” refers generally to the problem of 
learning to perform more efficiently with practice (a 
form of skill learning). A particular case of speedup
learning, called dynamic optimization, is the problem
of improving the average-case performance of a program
without affecting its correctness. Unlike static
optimization, dynamic optimization requires sample runs
or other experience with the distribution of problem
instances to be solved.

One approach, known variously as memoization or caching,
is to retain results of solved problems for subsequent
reuse in order to reduce the amount of redundant
computation. Explanation-based generalization is a
familiar example of this method. Another approach is to
formulate and refine useful rules about a search process.
SOAR and PRODIGY have successfully exploited this idea.

The approach taken here, however, is different: the 
structure of the program is modified in order to im- 
prove the average-case performance; and instead of 
making a priori assumptions about the “average case”, 
we learn what we need to know about it by statistical 
sampling. 

As a generic task, automated program speedup has great
potential for commercial return. Data processing programs
are very complex because they must handle every
contingency correctly, but most of these contingencies
occur rarely, if ever, during the lifetime of the program.
Restructuring the program so that it runs fastest on the
kinds of problems that it encounters most often is a
sensible approach, one that is best accomplished by
automation: not only is restructuring code a difficult and
error-prone process, but human users can seldom supply
more than a qualitative understanding of the statistical
properties of the data that the program will be processing.

Formal Problem Definition 

The goal of this work is to find practical optimiza- 
tion methods, not just to prove abstract or asymptotic 
properties of the problem. Still, a formal statement of 
the dynamic-optimization problem is useful. 

Fix a computational language L. Let Γ be a family of
correctness-preserving program transformations for
programs over L. We are given as input three things:
(1) a program P in L; (2) an unknown stochastic process S
that can be invoked to generate problem instances for the
program; and (3) a computable cost function C that assigns
a real-valued cost to any computation (footnote 1). In
cases where there may be multiple solutions, the cost of
running a program on an input instance is taken to be the
cost of finding the first solution. The task is to find a
program P' such that (1) P' is equivalent to P in the
sense that there is a finite sequence of transformations
in Γ that maps P into P', and (2) P' is optimal with
respect to C and S, i.e., the expected cost of solving
problems drawn from S is minimal (as measured by C) among
all programs equivalent to P.

For concreteness we apply our work to the Prolog language
(footnote 2) with three transformations (defined below):
predicate unrolling, clause reordering, and unfolding of
pairs of clauses. The cost functions I used in the
experiments were CPU time and the number of atomic
unifications. The problem instances have been
independently selected with replacement from some fixed
distribution, often with unbounded support (i.e., the
number of possible problems is infinite).

The transformations we admit are as follows (see Figure 1):

• (Unrolling) Copy all clauses of a predicate p and 
assign the predicate a new name, e.g., pcopy. 
Some references to the old predicate may be 
changed to the new name. In practice, we shall 
change all references in the tails of the unrolled 
clauses to refer to the new name. 

• (Reordering) Reorder the clauses of a particular 
predicate. 

• (Unfolding) Unfold two clauses C1 and C2 by resolving
a premise goal from C2 with the head of C1. The result
is a new clause to be added to the program (footnote 3).

These are not the only possible semantics-preserving 
transformations for logic programs, but they are suf- 
ficient to obtain good results in practice and simple 
enough to understand mathematically. 
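
The three transformations can be sketched on an argument-free program such as the one in Figure 1. The following is a minimal illustration, not the paper's implementation: the representation (a dictionary from predicate names to ordered clause lists) and the function names unroll, reorder, and unfold are assumptions made for this sketch. The unfold function only adds the resolvent; it omits unification and the "catchall" clause with its pcopyrest predicate.

```python
from typing import Dict, List, Tuple

# A clause is (head predicate, tuple of body goals); arguments are elided,
# as in Figure 1. This representation is an assumption of the sketch.
Clause = Tuple[str, Tuple[str, ...]]
Program = Dict[str, List[Clause]]


def _copy(prog: Program) -> Program:
    return {name: list(clauses) for name, clauses in prog.items()}


def unroll(prog: Program, pred: str, new_name: str) -> Program:
    """Copy the clauses of pred under new_name; body references to pred
    in both the originals and the copies are redirected to new_name."""
    def rename(body: Tuple[str, ...]) -> Tuple[str, ...]:
        return tuple(new_name if goal == pred else goal for goal in body)

    out = _copy(prog)
    out[pred] = [(pred, rename(body)) for _, body in prog[pred]]
    out[new_name] = [(new_name, rename(body)) for _, body in prog[pred]]
    return out


def reorder(prog: Program, pred: str, order: List[int]) -> Program:
    """Permute the clauses of pred; order lists the old clause indices."""
    out = _copy(prog)
    out[pred] = [prog[pred][i] for i in order]
    return out


def unfold(prog: Program, pred: str, i: int, k: int, j: int) -> Program:
    """Resolve body goal k of clause i of pred with clause j of that goal's
    predicate, and insert the resolvent just before clause i."""
    head, body = prog[pred][i]
    _, sub_body = prog[body[k]][j]
    resolvent = (head, body[:k] + sub_body + body[k + 1:])
    out = _copy(prog)
    out[pred].insert(i, resolvent)
    return out


fig_a: Program = {"p": [("p", ("p",)), ("p", ())]}   # [C1]: p <- p.  [C2]: p.
fig_b = unroll(fig_a, "p", "pcopy")                  # Figure 1(b)
fig_c = reorder(fig_b, "pcopy", [1, 0])              # Figure 1(c)
fig_d = unfold(fig_c, "p", 0, 0, 0)                  # unfold C'1 through C4
```

Each function returns a new program and leaves its input unchanged, so the successive versions of a learn/optimize run can be kept side by side for comparison.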

Properly, the dynamic optimization problem defined 
above is ill-posed if one can construct a Prolog pro- 
gram and an input source such that there exists an 
infinite sequence of equivalent programs each of which 
has lower expected cost than its predecessor. In fact 
one can do just this. But in practice this technicality
is removable by requiring only that we construct a
program whose expected cost is, with high probability,
within ε of optimum, for arbitrary ε > 0.

More than a technicality is the fact that the
(corresponding decision) problem is NP-hard. Even if we
restrict the transformations to clause reordering, the
problem of determining whether or not there is a
reordering such that the expected cost is no greater than
a given bound C is NP-complete, as demonstrated by a
reduction from the minimal set cover problem. We are,
therefore, unlikely to find a polynomial-time solution to
the problem, but we are certainly free to look for a
polynomial-time approximation algorithm with some
performance guarantee. Indeed, this is our approach.

1 We also need some reasonability conditions on the cost
function, e.g., if one computation extends another, the
cost increases. As a minimum the cost should be a Blum
measure.

2 In this paper we treat only “pure” Prolog, but the
techniques extend to most impure constructs as well.

3 The standard EBG algorithm unfolds the program
through all the resolution steps used in solving a single
problem instance (4).

Note: In the programs below only the predicates are
shown, without their arguments, e.g., p instead of
p(...).

(a) Initial program:

[C1]: p <- p.
[C2]: p.

(b) After an unrolling (clauses C3 and C4 are copies
of C1 and C2, resp., but with the predicate renamed):

[C'1]: p <- pcopy.
[C2]: p.
[C3]: pcopy <- pcopy.
[C4]: pcopy.

(c) After reordering the pcopy predicate:

[C'1]: p <- pcopy.
[C2]: p.
[C4]: pcopy.
[C3]: pcopy <- pcopy.

(d) Let θ be a unifier for the underscored literals
above (the pcopy subgoal of C'1 and the head of C4);
after unfolding C'1 through C4, we obtain:

[C'1.1]: p θ.
[C2]: p.
[C'1.2]: p <- pcopyrest.
[C4]: pcopy.
[C3]: pcopy <- pcopy.
[C3.1]: pcopyrest <- pcopy.

Here, C3.1 is a copy of C3; note that the case C4 is
covered by the unfolded clause C'1.1.

Figure 1: Basic Prolog transformations

The Learn/Optimize Cycle

Our main point of departure from the caching ap- 
proach is to separate the learning phase from the pro- 
gram transformation phase. A similar approach has 
been employed by Greiner and Orponen (1991) for 
non-recursive query languages. During the learning 
phase one collects statistics about the probabilities and
costs of success and failure of specific clauses at
specific points (contexts) in the proof. The
transformation phase uses these statistics to select
transformations that are likely to improve performance.
Applying these transformations results in a revised
program P' equivalent to the original but with lower
expected cost for the same source of problem instances.

Ours is not a one-shot learning and transforming process,
however: we repeat this learn/optimize process starting
with P', deriving yet another optimized program P'', and
so forth. To see why this is necessary, imagine that we
subject the program in Figure 1(b) to the learn cycle and
thereupon decide to reorder the clauses C'1 and C2. As a
result of this change the expected costs and probabilities
for C3 and C4 will also change; hence any decision about
the optimal order of C3 and C4 should be deferred until
C'1 and C2 have been reordered and the statistics revised.

In general, the cycle of learning and optimizing is re- 
peated, with transformations occurring at successively 
deeper levels of the program, until some kind of conver- 
gence is achieved. Unrolling is followed by reordering 
or unfolding in order to effect optimizations at specific 
points in the computation. As the program is unrolled,
the total number of clauses increases, but in practice
the physical size of the program has a negligible effect
on the run time. Note that unrolling has the effect
of separating out a finite, non-recursive initial part of 
the computation from the later, recursive parts. For 
example: 

• In the transformation from Figure 1(a) to 1(b), 
the initial call (to p) is distinguished from the re- 
cursive calls to pcopy. Thus our learning phase 
can gather different statistics for the initial pro- 
gram call to p and for the subsequent recursive 
calls to pcopy. 

• If, in Figure 1(a), the subgoal p from clause C1
is solved more efficiently using a different clause
order (C2 before C1) from that of the main goal
p, then the program will be unrolled as in Figure
1(b) and reordered as in Figure 1(c).

An unfolding transformation may increase by one the
number of clauses in a procedure (e.g., p in Figure
1(c-d)). The “catchall” clause (C'1.2) invokes a
different version of the unfolded predicate, one that
omits the clause that was unfolded (here, C4 is omitted
in pcopyrest).

Unrolling the program in this way is another distin- 
guishing feature of our approach. A reasonable ques- 
tion is: instead of unrolling, why not just optimize the 
program by reordering the existing clauses and un- 
folding some of them for some number of steps? The 
reason is that advice obtained early in a long search 
has exponentially greater benefit than advice obtained 
later, since more of the search space will be avoided. 
After initial experiments in which no unrolling was 
done, I found that the additional steps of unrolling 
and optimizing the initial calls in the program pro- 
vided substantially greater improvement. 

To summarize, the learn/optimize cycle gradually unrolls
the program and optimizes it for calls first at depth 0,
then at depth 1, and so on. If on some cycle no
optimizations can be found at depth d, the cycle still
continues by optimizing depth d + 1. The procedure halts,
not because it runs out of transformations, but because
of finite limits on the accuracy of the learning
algorithm. At this point the final optimization is to
reorder the clauses of the recursive procedures (without
unrolling).
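
The control flow summarized above can be sketched as a simple driver loop. This is an illustrative skeleton under stated assumptions, not the paper's optimizer: the callbacks invoke_prob and optimize_depth, and the two limits min_invoke_prob and max_depth, are hypothetical stand-ins for the learning phase, the transformation phase, and the accuracy limits of the learning algorithm.

```python
from typing import Callable, List, Tuple


def learn_optimize(
    invoke_prob: Callable[[int], float],    # hypothetical: likelihood that calls at a depth occur
    optimize_depth: Callable[[int], bool],  # hypothetical: learn + transform; True if changed
    min_invoke_prob: float,                 # assumed accuracy limit of the learner
    max_depth: int,                         # assumed absolute bound on unrolling depth
) -> Tuple[List[Tuple[int, bool]], int]:
    """Optimize depth 0, then depth 1, and so on, until the learner can no
    longer gather reliable statistics. Returns (history, depth reached)."""
    history: List[Tuple[int, bool]] = []
    depth = 0
    while depth <= max_depth and invoke_prob(depth) >= min_invoke_prob:
        changed = optimize_depth(depth)
        history.append((depth, changed))
        depth += 1  # continue even when nothing changed at this depth
    # At this point the recursive procedures would be reordered once,
    # without further unrolling (not modelled here).
    return history, depth


# Toy run: calls at depth d occur with probability 0.5 ** d, and the
# (made-up) transformation phase finds nothing to do at depth 1.
history, reached = learn_optimize(
    invoke_prob=lambda d: 0.5 ** d,
    optimize_depth=lambda d: d != 1,
    min_invoke_prob=0.1,
    max_depth=10,
)
```

In the toy run the loop passes through depth 1 without a change and still goes on to depths 2 and 3; it stops at depth 4 only because the invocation likelihood has dropped below the assumed threshold.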

Below, we discuss several specific issues concerning the 
learn/optimize cycle. 

Well-foundedness 

A problem can arise when some of the clauses of a
procedure overlap in the cases that they cover, i.e.,
more than one clause can successfully resolve certain
goals. Suppose, for example, that in Figure 1(b) C3 and
C4 each cover almost all provable instances of the pcopy
predicate, with equal expected cost. Then with the clause
order shown, we will charge C3 with almost all the cost
of solving pcopy goals. C3 thus appears to be more
expensive than C4, and the transformation phase will
therefore place C4 before C3 in the revised program. But
after reordering and repeating the learning phase, C4 is
now charged for the cost of pcopy, and the original order
appears preferable. By continuing to reverse the order of
these two clauses we could get stuck in a loop that never
improves the program. As noted above, finding an optimal
ordering of a set of clauses is NP-complete, so no exact
algorithm is likely to be significantly faster than
testing every possible ordering.

A simple solution to this problem is that once the 
clause order of a predicate is chosen, we should never 
change it again, even if subsequent learn/optimize cycles
make a different order seem better. It turns out that
this “greedy” procedure for clause reordering yields an
approximation whose expected cost is within a logarithmic
factor of optimal.

What about unfolding transformations? For these 
there is no inverse folding transformation in the admis- 
sible set, so once carried out they cannot be undone. 
However, the order of the unfolded clauses (e.g., C'1.1,
C'1.2, and C2 in Figure 1(d)) may be changed on the
next cycle. Below we shall show that the question of
whether the statistics can mislead us into performing a 
sub-optimal unfolding transformation can be finessed, 
and the search for unfolding transformations reduces 
to one of finding good clause orderings. 

Convergence 

As discussed above, it may happen that as a result of a
transformation the measured expected cost of the program
actually increases on the next cycle. Even so, we continue
the cycle, because transformations on subsequent cycles
can reduce the mean cost of the program substantially.
Consequently, this is not a hill-climbing procedure.

If the learning algorithm were to continue to unroll 
clauses and collect statistics on them to arbitrary ac- 
curacy, the learn/optimize cycle might never halt: ever 
deeper transformations might be found that continue 
reducing the expected cost of the program, ultimately 
by very small amounts. In practice, however, it is the 
learning algorithm, specifically its finite sample-size 
limits, that bounds the cycle. The deeper one goes 
in the search tree, the lower the probability that the 
nodes will be encountered on any given problem, and 
hence the larger will be the number of instances that 
must be solved in order to estimate the likelihoods and 
costs accurately. 

Slightly more formally, one can characterize the pro- 
gram to which the process converges by defining a 
program to be d-optimized if all its calls at depths d 
and below are optimal. Then if the learning algorithm 
provides correct statistics (in the PAC sense), the op- 
timized program will be (probably approximately) d- 
optimized to some depth d. 

Order of Transformations 

Suppose that for a subgoal at some point in the program,
an unfolding transformation and a reordering
transformation would each be effective, according to the
results of the most recent learning phase. Which
transformation(s) should we perform? My intuition tells
me that the clauses should be reordered before any of
them is unfolded with subsequent clauses, but initially I
had difficulty justifying this claim. Below I’ll argue
why it is in fact true.

The Learning Algorithm 

The statistics collected during the learning phase are
driven by the need to predict the efficiency of program
transformations during the subsequent phase. Consider the
predicate p in Figure 1(b) and the decision about the
ordering of the clauses C'1 and C2. The solution to this
problem is well known: we need to estimate for each
clause i the a priori probability p_i that the clause
will succeed (independently of whether the other clause
leads to a solution) and the expected cost C_i of
applying the clause (regardless of success or failure).
Then if we change the program by placing these clauses in
decreasing order of p_i/C_i, the ordering of the p
clauses is optimal, provided no two clauses cover (solve)
the same problem instance. If multiple clauses cover some
instances, we still adopt the same ordering: although
this may not be an optimal ordering, it is (as noted
above) a greedy approximation to optimal. On subsequent
passes of the cycle, the ordering of the clauses C'1 and
C2 will not be altered, so statistics on them need not be
collected. Instead, clause C3 will be unrolled (C4 has no
subgoals to unroll) and statistics p_i and C_i will be
collected for clauses C3 and C4.
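
The ordering rule can be checked numerically. The sketch below assumes, as the text does for the optimal case, that no two clauses solve the same instance, so the probability of reaching a clause is one minus the success probabilities of the clauses tried before it. The clause names A, B, C and their statistics are invented for illustration.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# (clause name, success probability p_i, expected cost C_i); values are made up.
Stat = Tuple[str, float, float]


def expected_cost(order: Sequence[Stat]) -> float:
    """Expected cost of trying the clauses in order until one succeeds,
    assuming the clauses cover disjoint sets of problem instances."""
    reach = 1.0   # probability that the current clause is tried at all
    total = 0.0
    for _, p, c in order:
        total += reach * c
        reach -= p
    return total


def greedy_order(stats: Sequence[Stat]) -> List[Stat]:
    """Place clauses in decreasing order of p_i / C_i."""
    return sorted(stats, key=lambda s: s[1] / s[2], reverse=True)


stats: List[Stat] = [("A", 0.30, 4.0), ("B", 0.50, 10.0), ("C", 0.15, 1.0)]
best = greedy_order(stats)
brute_force_minimum = min(expected_cost(perm) for perm in permutations(stats))
```

For these numbers the p_i/C_i ratios are 0.075, 0.05, and 0.15, so the greedy order is C, A, B, and its expected cost coincides with the minimum found by trying all six permutations. When clauses overlap, the same sort is only the greedy approximation described in the text.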

Efficient statistical methods for estimating p and C 
for a clause to within any given accuracy and confi- 
dence are well known, and a sufficient sample size is 
easily computed. For some clauses, p may fall below
the accuracy limit and thus be estimated as zero:
such clauses are not optimized further. Typically the
deeper one gets in the program (i.e., the more unrolling 
transformations have been performed), the smaller the 
likelihood that any particular problem instance will 
invoke that clause, and the larger the sample size re- 
quired to collect the needed statistics. Consequently 
we impose a lower bound on the absolute likelihood 
that a clause will be invoked and refuse to optimize 
clauses whose likelihood falls below that threshold. It 
turns out (Laird, 1992b) that this lower bound is not 
enough to guarantee termination of the cycle: we must 
also impose an absolute upper bound on the depth 
of the unrolling. In practice, however, this absolute 
depth will probably not be reached. 
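
To make the growth in sample size concrete, the sketch below uses the Hoeffding bound, which is one standard way to size a sample for estimating a probability; the text does not say which bound was actually used, so this choice is an assumption. It covers only the success probability p (cost estimates would additionally need a bound on the range of costs), and it assumes that a clause is invoked at most once per problem instance with likelihood invoke_prob.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Number of invocations after which the observed success frequency is
    within epsilon of the true probability with confidence 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def instances_needed(epsilon: float, delta: float, invoke_prob: float) -> int:
    """Number of problem instances to draw so that the expected number of
    invocations of the clause reaches the Hoeffding sample size."""
    return math.ceil(hoeffding_sample_size(epsilon, delta) / invoke_prob)


n_invocations = hoeffding_sample_size(0.05, 0.05)
n_shallow = instances_needed(0.05, 0.05, invoke_prob=0.5)
n_deep = instances_needed(0.05, 0.05, invoke_prob=0.125)
```

With accuracy and confidence parameters of 0.05, a clause that is invoked on half of the instances needs about twice the base sample, while one invoked on one instance in eight needs eight times as many, which is why a threshold on invocation likelihood is a natural stopping rule.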

Finally we are left with optimizing the recursive
clauses (like pcopy in Figure 1(d)). Unlike the clauses
that precede them in the computation, such clauses are
invoked multiple times. A Markov-tree learning algorithm
(called the TDAG algorithm) is used to compute the
probabilities of success for these recursive
procedures. Unfortunately there is no polynomial 
time bound on the sample size needed to estimate the 
probabilities for such clauses, since for Markov pro- 
cesses successive events are not statistically indepen- 
dent. The sample size needed to ensure the accuracy 
of the statistics is highly dependent upon the struc- 
ture of the underlying call graph and the transition 
probabilities determined by the input source. As a 
practical approach, I have determined this sample size 
heuristically, with good results. 

Finding Unfolding Transformations 

We have seen that the necessary statistics and the 
corresponding procedure for selecting good clause- 
reordering transformations are straightforward, and 
that we can quantify the relationship between the 
optimal order and our approximation to it. What 
about unfolding transformations? In a previous pa- 
per (Laird, 1992a) I outlined a rather complicated 
algorithm for finding unfolding transformations, with 
special statistics collected for the purpose during the 
learning phase. But it turns out that virtually the 
same algorithm as for reordering transformations ap- 
plies. In fact, unfolding can be viewed as a special case 
of reordering. 

The simple program in Figure 1 (c) represents the SLD 
search procedure for resolving goals with predicate p. 
In Figure 2 a portion of the depth-first search tree for 
this program is shown explicitly. The SLD-resolution
procedure searches this tree in pre-order left-to-right.



[Figure 2: tree diagram. Recoverable node labels: root
?- p; depth two: C'1·C4, C'1·C3; depth three: C'1·C3·C4,
C'1·C3·C3; (etc.)]

Figure 2: Search tree for the program of Fig. 1(c).

[Figure 3: tree diagram. Recoverable node labels: root
?- p; (etc.)]

Figure 3: Reordered search tree.

[C'1C4]: p θ.
[C2]: p.
[C'1C3]: p φ <- pcopy φ.
[C3]: pcopy <- pcopy.
[C4]: pcopy.

Figure 4: The program of Fig. 1(b) reordered to depth
two, according to Fig. 3. The substitution θ is that
shown in Fig. 1(d), unifying the pcopy consequent of
clause C'1 with the head of C4. The substitution φ
unifies the pcopy term in clause C'1 with the head of
clause C3.


In the figure each node is labeled with the sequence of
clauses used to resolve subgoals. For example, to reach
the node labeled C'1·C3, one resolves the p-goal using
clause C'1, tries to resolve the resulting pcopy-subgoal
using clause C4 but fails, backtracks, and tries again
using clause C3.

The order of the nodes C'1 and C2 at depth 1 is
established, as described above, by placing them in order
left-to-right with decreasing values of p/C. The order of
the clauses C3 and C4 below C'1 is determined similarly.

Note that the option to consider node C2 after node
C'1·C4 but before C'1·C3 is not available if we may only
reorder clauses for a single predicate. It is available,
however, if we consider reordering all the nodes at depths
≤ 2, i.e., the nodes C'1·C4, C'1·C3, and C2. To do so we
need to estimate the p/C values for the pairs of steps
that lead to the depth-two nodes; e.g., we need to find
the likelihood that a successful solution will be found
using clause C'1 followed by C4, in comparison to the
other two possibilities of C'1 followed by C3 or C2.
Suppose we do this and we discover that the p/C value for
C2 is between those of the two nodes at depth two; then
we should reorder the tree as in Figure 3. The
corresponding program is shown in Figure 4. Note that
there are now three clauses for the predicate p, since
there are three nodes of depth ≤ 2 in the tree; the first
and third clauses have each been unfolded one level.

This example illustrates how appropriate unfolding 
transformations can be found using the same learning 
statistics as for reordering. Actually, there is a mi- 
nor difference between reordering one-step nodes and 
two-step nodes: in the event of failure, the program 
of Figure 4 must unify the p goal with the head of a 
p-clause three times, instead of two in Figure 1(c); this 
small additional expense can be estimated and taken 
into account in the decision about whether to change 
the order of the tree in Figure 2 to that of Figure 3. 
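
The depth-two reordering can be illustrated with a few lines of code. The statistics below are invented, chosen so that the p/C value of C2 falls between those of the two depth-two nodes; the function names and the clause table are assumptions of this sketch. Substitutions and the small extra head-unification cost mentioned above are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical learning-phase statistics: node -> (success probability, expected cost).
NODE_STATS: Dict[str, Tuple[float, float]] = {
    "C'1·C4": (0.5, 2.0),   # C'1 followed by C4
    "C'1·C3": (0.2, 5.0),   # C'1 followed by C3
    "C2": (0.3, 2.0),
}

# Clause for p that corresponds to each node (arguments elided, as in Figure 4).
NODE_CLAUSE: Dict[str, str] = {
    "C'1·C4": "p.",           # C'1 unfolded through C4
    "C'1·C3": "p <- pcopy.",  # C'1 unfolded through C3
    "C2": "p.",
}


def order_nodes(stats: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order the nodes of depth at most two by decreasing p/C."""
    return sorted(stats, key=lambda n: stats[n][0] / stats[n][1], reverse=True)


def clauses_for_p(order: List[str]) -> List[str]:
    """Emit the clauses of p in the order given by the reordered tree."""
    return [f"[{node}]: {NODE_CLAUSE[node]}" for node in order]


order = order_nodes(NODE_STATS)
program_p = clauses_for_p(order)
```

With these numbers the ratios are 0.25, 0.15, and 0.04, so C2 lands between the two unfolded clauses, which is the clause order for p shown in Figure 4.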

Although we have treated only a special case, the general
procedure for determining unfolding transformations is
the same: determine the best ordering for pairs of
resolution steps instead of single steps; if in the
resulting order a clause at the lower level has only one
immediate child node (e.g., the nodes labeled C'1 in
Figure 4), go ahead and unfold. By iterating this
procedure during the learn/optimize cycle, one can end up
unfolding multiple steps into a single clause, without
the need to estimate statistics for more than two steps
at a time.

Conclusion: Where do we go from here? 

The work described here builds upon and extends the work
of many researchers, including Smith and Genesereth
(1985), Prieditis and Mostow (1987), Gooley and Wah
(1989), and Greiner and Orponen (1991). The principal
contributions are as follows:

• We have shown that recursive programs can be 
dynamically optimized efficiently and effectively. 
Previous work has reported difficulties speeding 
up recursive programs. 

• The nature of that optimization can be quantified 
better than previous heuristic methods were able 
to. 

• We have shown how search control can be embed- 
ded in the program instead of being added on in 
the form of a time-consuming meta-theory that 
must be evaluated outside the actual program. 

• By averaging over several runs we have removed 
the dependency of the resulting optimized pro- 
gram on the order of the examples. This has been 
a consistent problem with caching methods. 

• We have integrated the utility estimates into the 
learning procedure: no transformations are even 
generated unless their utility justifies it with high 
probability. The learning data and the utility 
evaluation data are one and the same. 

• We have eliminated all vestiges of the
“generalization-to-N” anomalies and tricky
“operationality” decisions that occur with EBG-based
methods. In fact, unfolding itself has been reduced to
a minor transformation that occurs after consecutive
pairs of clauses have been reordered.

This work is still in progress. Future plans call for 
redoing the previous dynamic optimizer for Prolog to 
incorporate the improvements described here and to
handle many of the so-called impure constructs in Prolog,
including negation-as-failure, call, and, and or.
Mathematical analysis of this approach is still incom- 
plete, and we cannot yet argue that a strategy different 
from unrolling and reordering will not provide superior 
optimization. 

Finally, for dynamic optimization of programs to be 
of commercial value, we must be able to optimize pro- 
grams written in commercial programming languages. 
Prolog is not a commercial data-processing language
and, in my estimation, is not likely to become one.
The method described here does not apply to procedural
languages like C and Fortran, where the order of
decisions cannot be changed. One could add
non-deterministic search primitives to such languages, but
such new constructs are unlikely to gain wide accep- 
tance. Constraint-logic programming, on the other 
hand, is growing in popularity as a programming lan- 
guage, and it is quite likely that these optimization 
methods are applicable. Also, very similar dynamic- 
optimization problems occur in database query lan- 
guages, where further opportunities for commercial- 
ization may be found. 